EVALUATING THE EFFECTIVENESS OF ADMINISTRATIVE TRAINING PRODUCTS

Edward H. Behrman                William Evans

Research for Better Schools      University of Pennsylvania

Educational R&D, spurred by the creation of the labs and centers in the
past decade, has created a proliferation of new products in the educational
marketplace. Product evaluation has emerged as an area of considerable
importance in the measurement field, as both the developer and the funding
agent seek to assess the worth of the emergent product. Although product
evaluation strategies have become more sophisticated in the decade, they
nonetheless have been generally built to assess curriculum products designed
for student populations (e.g., Bloom, Hastings, & Madaus, 1971; Gagne, 1967;
Grobman, 1968; Scriven, 1967; Tyler, 1967). Such strategies are suggestive
but less than helpful when attempting to evaluate other kinds of educational
R&D products, such as those designed to train school administrators in
aspects of educational management. Examples of this latter type of
educational product include varied prototypes now under development, such as
those designed to train administrators in project management, curriculum
selection, curriculum evaluation, cost-effectiveness analysis, needs
assessment techniques, etc.

The purpose of the present paper is to make clear the distinction between
evaluating the two types of products and to propose one possible method for
evaluating the effectiveness of administrative training products. In doing
so, the paper will also highlight the difficulties, conflicts, and trade-offs
encountered in determining product effectiveness of this latter group. The
paper appears most useful to evaluators and developers,¹ who should be able
to make valid statements about product effectiveness before requesting
continued funding for product dissemination. It may also be helpful to
funding agents, who must make dissemination decisions.

We should stress at the outset that the argument identifying the evaluation
of product effectiveness as a summative rather than formative function is
persuasive, but only to a point. Scriven (1967) has already noted that there
is no absolute cut-off between formative and summative phases. The formative
evaluator is often called upon to produce evidence of product effectiveness
in order to obtain funding for continued development or dissemination.
Therefore, concern over product effectiveness may be justified during both
phases of evaluation, although the shape and scope of the effectiveness
assessment is determined by both the nature of the product and its
developmental status.

The method of the present study involves comparison of the formative
evaluation strategies employed in the assessment of four educational R&D
products, one being drawn from the evaluation of a student curriculum product
and three being drawn from the evaluations of administrative training
products. These four strategies are described and then critiqued along eight
interrelated dimensions: (a) the definition of product effectiveness; (b)
criterion measures of effectiveness; (c) the determination of users'
pre-treatment levels of ability on the criterion; (d) types of comparisons;
(e) sample size and comprehensiveness; (f) the determination of acceptable
criterion performance; (g) confounding variables; and (h) intervening
variables. Although determination of product effectiveness is only one aspect
of overall formative evaluation procedures, it is the sole focus of this
paper. Of course, other activities, such as "debugging" content and
instructions, must precede the collection of product effectiveness data.

¹ Throughout the paper, "developers" refers to any personnel employed by the
product development agency.

The data presented were gathered as part of on-going evaluation efforts at
one of the educational labs. To give the reader a better idea of the context
with which this paper deals, let us briefly describe each of these product
evaluations. The student curriculum product is an individualized science
curriculum. The administrative training products include one designed to
train school managers in proposal development; another designed to train
school managers in curriculum evaluation; and the revised prototype of the
curriculum evaluation product.

THE FOUR EVALUATION STRATEGIES

1. Student curriculum: science. Students received instruction in the
regular school setting throughout one academic year. A pretest/posttest,
comparison group design was used to determine how well students in the
individualized curriculum achieved two prespecified goals of the product
(science achievement and attitude toward science). Students from three pairs
of matched schools (n = 636 for pretest and 615 for posttest) were
administered the developer-constructed achievement and attitude measures in
the fall and again in the spring. The same form of each measure was used in
pre- and posttesting. Students were tested as individuals but scores were
grouped together by grade level and by school for data analysis. Both gain
scores within schools and comparison scores between schools were computed
(see Evans, 1973).



2. Administrative training: proposal development. School administrators
from six districts were trained on-site in the district by another member of
their own staff over a three-week period. A pretest/posttest, single group
design was employed to evaluate whether administrators achieved the
prespecified goals of the product (ability to manage a proposal development
project). Content recall was measured by 10 multiple-choice items for each
lesson, with the same set of items serving as both pre- and posttests. In
addition, simulation tests of performance -- actually exercises contained
within the instructional lessons -- were used as additional, posttest-only
measures. These simulations required learners to apply what they had learned
to a hypothetical situation of the developer's creation. Significant gain
scores on the mastery tests and subjective judgments by developers of quality
work on the simulations were indices of acceptable criterion performance.
Each of the 35 administrators was tested as an individual and the individual
scores were combined into a single group for data analysis (see Evans,
Note 1).

3. Administrative training: curriculum evaluation. School administrators
from two districts were trained on-site in their own districts by the
developer during two-day sessions. A posttest only, single group design was
used to determine if administrators could demonstrate achievement of the
prespecified product goal (ability to initiate, plan, and monitor a
curriculum evaluation project). Simulation tests of performance comprising
exercises and worksheets included in the instructional materials were
reviewed by developers to yield subjective ratings of quality performance.
These simulations required learners to apply what they had learned to a
hypothetical situation of their own creation. Six administrators from each
district comprised the sample (n = 12). Since administrators from the same
district worked collaboratively to complete exercises and worksheets, only
group performance was scored (effective n = 2) (see Behrman, Note 2).

4. Administrative training: curriculum evaluation (revised prototype).
Using the revised prototype, administrators were again trained how to manage
a curriculum evaluation project. This time, though, they trained themselves
on-site in their districts without developer support. Because the exercises
now required the administrators to apply what they had learned to an actual,
on-going evaluation project, training continued intermittently over several
months. Again, a posttest only, single group design was used to determine if
administrators could attain the prespecified goal. Performance was reviewed
as in the earlier prototype, except that administrators applied learning to
real rather than hypothetical situations; thus the tests were work samples
rather than simulations. Thirty-nine administrators from four districts
(working on seven separate projects) comprised the sample. Since project
groups worked collaboratively, only group performance was scored (effective
n = 7) (see Behrman, Note 3).

The four evaluations described above represent a fairly wide range of
strategies. The evaluation of the science product seems to be a rather
typical assessment of student curriculum product effectiveness (with the
possible exception of employing specially-developed instruments). It is
well-suited for the one-way ANOVA design and contains samples large enough to
permit powerful statistical inference. Generalizations to similar students in
similar schools are possible.

Such is not the case with the three administrative training products, which
do not fit so neatly into the traditional evaluation model. That they fail to
do so is not necessarily an indictment against the quality of these latter
evaluations; in fact, it is a goal of this paper to show why attainment of
the traditional design is so difficult (and may not even be desirable) when
evaluating administrative training products. Therefore, the next step in our
discussion is to critique the evaluation strategies described above along
each of eight interrelated dimensions.

CRITIQUE ALONG EIGHT DIMENSIONS 
Definition of product effectiveness. Invariably, effectiveness was defined
as the match between prespecified product goals and observed learner
performance; that is, all four evaluations were explicitly of a "goal-full"
rather than "goal-free" nature. The science curriculum specified its goals in
rather broad terms (science achievement and science attitude). While the
administrative products also specified broad goals, these were analyzed into
sub-goals or objectives for measurement purposes. For example, the curriculum
evaluation product measured the following objectives for units (or tasks)
1 and 2:

    1. ability to construct an evaluation purpose statement

    2. ability to develop an overall evaluation design

    3. ability to specify evaluation instruments and subjects

We might observe that the more narrow specification of sub-goals is merely
the result of an apparent behavioral objectives approach followed by the
administrative product evaluator. This statement, while true, describes
rather than explains. Administrative products may have to use more specific
sub-goals because their content areas are less universally known. "Science
achievement (in first grade)" is likely to elicit more common definition than
"ability to manage an evaluation." What skills are needed to manage a
curriculum evaluation? Can we assume that they are well-known and
agreed-upon? Probably not; thus the evaluator of an administrative training
product needs to subdivide the content area into discrete parts and then
measure attainment of the parts. He should also offer a convincing argument
that the user who can perform successfully on each of the sub-tasks has in
fact performed successfully on the overall task; he should show that the sum
of the parts equals a whole. That is, the administrative product evaluator
may need to establish either judgmental or empirical validity for his
measures, unlike the student product evaluator, whose measures may already
have established validities. Furthermore, the administrative evaluator may
need to show, in addition to the sum of the parts equalling a whole, that the
whole is somehow worthwhile -- that this prespecified goal is desirable. Why
is it important for school administrators to manage curriculum evaluation
projects? Aren't evaluation specialists supposed to do this? By contrast, it
seems unlikely that a critic would ask, Why is it important for elementary
students to learn science?

In other words, while product effectiveness was defined in all four cases
as the match between goals and learner performance, the definition (and
promotion) of these goals appears more difficult for the administrative
product evaluator.

Criterion measures of effectiveness. Bloom (1956) describes cognitive
objectives along a sequential taxonomy, beginning with knowledge and then
followed by comprehension, application, analysis, synthesis, and evaluation.
Both content "mastery" (i.e., knowledge and comprehension) and application
are represented by the criterion measures used in the four evaluations
described above. The science curriculum measured knowledge and comprehension
via an achievement test. The proposal development product measured knowledge
(but not comprehension) via an achievement test and measured application to
a hypothetical situation provided by the developer via a simulation. The
first curriculum evaluation product measured application to a hypothetical
situation created by the learner via simulations. And the revised curriculum
evaluation product measured application to an actual situation via work
samples. We may conceptualize these measures of the criterion graphically, as
in Figure 1:

FIGURE 1
Criterion Measures of Effectiveness

In other words, each measure corresponds to a different point along the
continuum. The dilemma of the administrative product evaluator is this: if
he attempts to measure effectiveness via an achievement test, he is subject
to the criticism that an administrative training product should produce more
than "paper-and-pencil mastery" of cognitive content; on the other hand, if
he attempts to measure effectiveness via a performance test (simulation or
work sample) and results are negative, he will be unable to say whether poor
performance data is due to learners' failure to master cognitive content,
their failure to translate mastery to performance, or both. Another dilemma
arises when choosing between simulation and work sample: while a simulation
is more controllable, work samples may offer more realistic (and hence more
valid) measures of performance.

Naturally, the way in which product goals are written may guide the
evaluator toward the most appropriate measure. Is the administrator being
trained to "master the principles of proposal development" or to "apply the
principles of proposal development to an on-going project in his district?"
Sometimes, however, product goals are so general that any of the four
measures above could be used. The question is: Which one? Student curriculum
evaluators are usually spared from this decision. We rarely ask a student to
build a battery, just identify parts of a battery in a picture.

User's pre-treatment level of ability. The science evaluator administered
identical forms as pre- and posttests of learner ability. The proposal
development evaluator also administered identical forms as pre- and
posttests. In neither evaluation of the curriculum evaluation product were
users' pre-treatment levels of ability determined, apparently for two
reasons: (a) the criterion measures were highly idiosyncratic to the
instructional materials, and it may have been unfair to pretest users
unfamiliar with the terminology and organization presented in the materials;
(b) as the focus was on group rather than individual performance,
pre-measures of individual ability were irrelevant.

Let us examine each of these reasons for failure to obtain pre-measures.²
That the measures were idiosyncratic may be reflective of the product itself
-- that it proposes terminology and organization different from those in
popular use. Yet it does not seem unfair to study whether learners have
certain knowledge or skills, no matter what terms or procedures they use.
The task of the evaluator is to develop a measure of criterion performance
that is independent of the product (that is, free of its idiosyncratic
terminology and organization). The second problem -- how to measure group
performance -- is more perplexing, especially if the measure employed is an
on-the-job work sample. Since the group in training may not have worked
together as a group before the training, any pre-training work samples
collected may have been produced by a different set of individuals (i.e., a
different "subject") in the same district.

² We say "failure" because these evaluations did not employ control groups
either.

Types of comparisons. In general, three types of comparisons may be useful
in determining product effectiveness: (1) pre- vs. post-measures, (2)
observed performance vs. desired performance, and (3) treatment group vs.
comparison group. Comparisons between pre- and post-measures yield "gain
scores" which cannot be considered treatment effects without benefit of a
comparison group or other control. Similarly, discrepancies between observed
and desired performance are merely descriptive without the experimental
control provided by comparison groups. Thus, the identification of comparison
groups is an assignment of paramount import to the evaluator who wishes to
make inferences of cause-and-effect.
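The point about gain scores can be illustrated with a small calculation. The
sketch below is not taken from any of the four evaluations: the scores and
the function names are hypothetical, chosen only to show why a pre/post gain
by itself may overstate a treatment effect when an untreated comparison group
also gains. It makes no claim about statistical significance.

```python
from statistics import mean


def mean_gain(pre, post):
    """Average post-minus-pre change for one group of paired scores."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired and of equal length")
    return mean(b - a for a, b in zip(pre, post))


def gain_beyond_comparison(treat_pre, treat_post, comp_pre, comp_post):
    """Treatment-group mean gain minus comparison-group mean gain."""
    return mean_gain(treat_pre, treat_post) - mean_gain(comp_pre, comp_post)


# Hypothetical scores, invented for illustration only.
treat_pre, treat_post = [10, 12, 11, 9], [16, 17, 15, 14]
comp_pre, comp_post = [10, 11, 12, 9], [13, 14, 14, 12]

raw_gain = mean_gain(treat_pre, treat_post)
adjusted_gain = gain_beyond_comparison(treat_pre, treat_post, comp_pre, comp_post)
print(raw_gain, adjusted_gain)
```

With these invented numbers the treatment group gains 5 points on average,
but the comparison group gains 2.75 without the product, so only 2.25 points
of the gain remain once the comparison group is taken into account.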

Note, though, that comparison groups were used in none of the evaluations
of administrative training products. Why is this so?

When evaluating performance of school administrators, we often are less
concerned with individual gains than with organizational gains. We want to
know, How much has the school district improved in its ability to manage
budgets; train staff; evaluate programs; bring in new monies; etc.?
Furthermore, we want to know whether the intervention (an administrative
training product) has accounted for this improvement. Thus the focus is on
organizational performance. So the comparison group must be composed of
similar organizations -- but similar in what ways? On what organizational
variables should we "match" school districts: pupil enrollment, location,
organizational climate, organizational structure, personal characteristics of
administrators? Without empirical evidence to show which variables moderate
organizational performance on a given task, there is no guide for the
administrative product evaluator to follow.

Sample size and comprehensiveness. The student curriculum evaluator
generally has available to him a large student population using the new
product under field test conditions. He may include all field test sites or
sample from them, but either way the sample size is usually large enough to
permit powerful statistical tests. Furthermore, the sample can be described
in demographic terms (urban/rural, minority representation, SES, etc.) and
can be shown to be representative of a larger population (i.e., the projected
users of the product). A sample that is both large and representative may
be called comprehensive.

On the other hand, the administrative product evaluator does not have
available to him a field test sample of thousands, or even hundreds, of
prospective users. A district may have 20,000 students, but it has only one
superintendent. And it is often more difficult to persuade a superintendent
or other district-level administrator to participate in an administrative
product tryout than it would be to persuade him to volunteer a sizable
portion of his 20,000 pupils for a student curriculum product tryout.
District participation in a student tryout may yield a sample of 500;
district participation in an administrative tryout may yield an n of 1, 2, or
3. Thus even so basic a task as identifying a sample of adequate size can
become overwhelming for the administrative evaluator.

If the administrative product is to be used in workshop mode, it may be
possible to involve 20 or 30 users in a single workshop. If, on the other
hand, the product must be used on-site, a one-day tryout may involve but a
few users. In such a case, multiple tryouts must be scheduled just to include
as many as 20 field test participants. And if the organization rather than
the individual is the unit of analysis, the effective n may only be 4, not 20.

Further, the determination of what is representative can be elusive.
Which variables should be described? For instance, with the high rate of
administrative mobility, does it make sense to label an individual as an
urban administrator when he may be a suburban administrator next year?

Acceptable criterion performance. When achievement test scores are used
as criteria, as they often are in student curriculum evaluations, acceptable
performance is a significantly higher mean score for the treatment group than
for the non-treatment group. But administrative training products can rarely
employ an achievement score as an index of acquired skill or training in an
executive function. Rather, simulation or work sample varieties of
performance testing are more frequently appropriate. Scores on these types of
performance tests may be heavily dependent on expert judgment, which is often
highly variable across judges and across subjects. The low reliability and
concomitant large error of measurement in such judgment scores hamper the
decisiveness with which the evaluator can state that one group outperformed
another.
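One way to make the concern about judge variability concrete is to compare
the ratings two judges assign to the same set of products. The sketch below
is only an illustration: the two judges, their 1-to-5 ratings of seven group
products, and the function names are all hypothetical, and the Pearson
correlation is used here as one simple index of agreement rather than as the
procedure followed in the evaluations described above.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of ratings."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_abs_disagreement(x, y):
    """Average absolute difference between the two judges' ratings."""
    return mean(abs(a - b) for a, b in zip(x, y))


# Hypothetical 1-to-5 quality ratings of seven group products.
judge_a = [4, 3, 5, 2, 4, 3, 5]
judge_b = [3, 4, 4, 3, 5, 2, 4]

agreement = pearson_r(judge_a, judge_b)
disagreement = mean_abs_disagreement(judge_a, judge_b)
print(round(agreement, 2), disagreement)
```

In this invented case the judges differ by a full scale point on every
product and their ratings correlate only moderately, which is the kind of
result that would make a small difference between two groups hard to
interpret.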

Confounding variables. Confounding and intervening variables can be
reasonably controlled in student curriculum product evaluations through use
of comparison groups that experience everything the treatment groups do
except the treatment. With administrative products, an important source of
confounding is the fact that administrators are being "trained" at all. The
comparison group should therefore also be trained, using a different training
product or method. But as mentioned earlier, it is often difficult enough to
secure a treatment sample of users, let alone a non-treatment sample who must
also be trained. A second important source of confounding is the motivational
level of administrators, who desire a long-term working relationship with the
development agency. Again, only if the non-treatment group is offered a
similar opportunity to develop a working relationship with the development
agency can the confounding be controlled.

Intervening variables. It may be reasonable to expect that comparison
groups of third-graders are exposed to similar school experiences during the
term. It is far less reasonable to expect that on-going professional
experiences of school administrators are similar throughout the year. Some
members of the field test sample may attend a conference, others may not.
Some may be involved in heated teacher contract negotiations, others may not.
Some may benefit from expanding budgets, others may be plagued by shrinking
budgets. Each of these variables may affect criterion performance in a
significant but unknown way.

So far, we have tried to point out some of the difficulties and inherent
limitations in evaluating the effectiveness of administrative training
products. Based on our review of three administrative training products and
one student curriculum product, we have noted that:

    1. Administrative evaluators must often specify product goals in
       almost behavioral terms, as there is seldom common understanding
       and definition of more global administrative goals.

    2. While administrative evaluators may demonstrate product
       effectiveness via user mastery of content, such a measure may be
       inappropriate for products designed to train new skills.

    3. Simulation and work sample varieties of performance tests, which
       may be more valid measures of administrative product
       effectiveness, often depend on expert judgments that are
       unreliable, creating large errors of measurement.

    4. Work samples, which are more valid than simulations, are far
       more difficult to control.

    5. Criterion measures are often idiosyncratic to the training
       product, hampering measurement of both users' pre-treatment level
       of ability and non-treatment groups' level of ability.

    6. The focus on organizational rather than individual performance
       makes pre- vs. post- and treatment vs. non-treatment comparisons
       difficult.

    7. Large, representative samples of school districts (and/or school
       administrators) may be difficult to identify and involve in
       field testing.

    8. Control of confounding and intervening variables is poor.

In light of these difficulties, it may be unreasonable to expect that
administrative training products follow the same evaluation strategy as
student curriculum products. It is hoped that there are forthcoming from the
educational R&D community other strategies that promise to overcome some of
these weaknesses in evaluating the effectiveness of administrative products.

A SUGGESTED PROCEDURE

The evaluation strategy suggested here is just that: a suggestion, not a
prescription. If followed, it may overcome some of the limitations of current
evaluation designs.

A logical starting point in product effectiveness evaluation may be to
ask, "What kind of claim do the developers and/or evaluators wish to make
regarding the effectiveness of the product?" Since the claim is dependent on
the strategy used, the claim desired may have implications for the evaluation
requirements -- e.g., the criterion measure of effectiveness, the sample
size, the evaluation setting, and so forth.



For example, the evaluator of the curriculum evaluation product may
present the developers with the following list of possible claims and ask
them to rank the claims in order of preference:

    a. This product is effective in teaching school administrators the
       principles of managing a curriculum evaluation project.

    b. This product is effective in teaching school administrators how
       to apply the principles of managing a curriculum evaluation
       project to a hypothetical situation created by the developer.

    c. This product is effective in teaching school administrators how
       to apply the principles of managing a curriculum evaluation
       project to a hypothetical situation of their own creation.

    d. Under proper conditions, this product may be effective in
       guiding school administrators through certain activities in an
       on-going curriculum evaluation project.

    e. Under proper conditions, this product may be effective in
       guiding school administrators through a complete curriculum
       evaluation project.



Each claim suggests a different evaluation strategy. Claim (a) calls for an
achievement test, Claims (b) and (c) a simulation test of performance, and
Claims (d) and (e) a work sample test of performance. The design for Claim
(a) could resemble that of the traditional student curriculum product
evaluation. A design for Claims (b) and (c) might be as follows: randomly
assign administrators from the same district to treatment and comparison
groups. After training, both groups will be asked to perform the same
simulated management activity. If individuals work collaboratively on the
activity, then there will be a single performance score for the treatment
and for the comparison group. To employ analysis of variance, there probably
should be a minimum of five such districts in this design, so that the
organizational n equals 10 (5 treatment and 5 comparison). The use of
administrators from the same district should control, to a large extent,
confounding and intervening variables. It also permits reasonable comparisons
between "equivalent" organizations.
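The mechanics of the design for Claims (b) and (c) can be sketched in a few
lines. Everything in the example is hypothetical: the district and
administrator names, the six-person rosters, the group ratings, and the
function names are invented for illustration. The sketch covers only the
random split within each district and the per-district difference between
the two group scores; it does not carry out the analysis of variance itself.

```python
import random
from statistics import mean


def assign_within_district(administrators, rng):
    """Randomly split one district's administrators into two groups."""
    shuffled = list(administrators)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "comparison": shuffled[half:]}


def district_differences(scores):
    """Treatment minus comparison group score, one value per district."""
    return {d: s["treatment"] - s["comparison"] for d, s in scores.items()}


rng = random.Random(0)  # fixed seed so the assignment can be repeated

# Hypothetical rosters: five districts with six administrators each.
roster = {
    f"district_{i}": [f"admin_{i}_{j}" for j in range(1, 7)]
    for i in range(1, 6)
}
groups = {d: assign_within_district(names, rng) for d, names in roster.items()}

# Hypothetical group-level ratings on the simulated activity
# (one score per group, since each group works collaboratively).
scores = {
    "district_1": {"treatment": 7, "comparison": 5},
    "district_2": {"treatment": 6, "comparison": 6},
    "district_3": {"treatment": 8, "comparison": 5},
    "district_4": {"treatment": 5, "comparison": 4},
    "district_5": {"treatment": 7, "comparison": 6},
}

diffs = district_differences(scores)
organizational_n = 2 * len(scores)  # 5 treatment + 5 comparison groups
print(organizational_n, mean(list(diffs.values())))
```

Because each district contributes both a treatment and a comparison group,
the per-district differences compare groups that share the same
organizational setting, which is the sense in which the design controls
confounding and intervening variables.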

A similar design for Claims (d) and (e) is possible, although it seems
unlikely that two groups of administrators would actually work on the same
management activity in a real situation. Therefore, in practice the requisite
strategy for Claims (d) and (e) may comprise a work sample test of
performance, a single group of specially-selected school districts, and an
anecdotal history of how the product worked in each district. Since the
latter design cannot be used to support causal relationships between
treatment and effect, the resultant claim is necessarily equivocal ("Under
proper conditions, the product may be effective...."). At best, the evaluator
may be able to suggest what conditions seem to be proper for effective
product use. Such equivocality should not be taken as a sign of low-calibre
evaluation; rather, it shows that the developers have attempted to field
test the product using a more ultimate criterion.

However, it may not be best to select a single, most desirable claim.
The problem with choosing a single claim is that, because we might not know
the exact relationship between content mastery and content application in
managerial training, interpretation of the claim is difficult. Suppose we
attempt to collect evaluation data to support Claim (b) and find that
administrators are unable to demonstrate application of the principles
taught. Is the product unsuccessful in teaching the principles themselves, or
is it failing to help users translate the principles to a hypothetical
application? We do not know, unless we have evaluation data on both content
mastery and content application.

Therefore, an optimal evaluation procedure would be multi-stage, each
stage focusing on one of the three criterion measures discussed earlier
(achievement test, simulation, and work sample). An evaluation report that
provides information on (1) how well users master content, (2) how well users
apply the content to a hypothetical situation, and (3) what actually happens
in the school district would clearly be a cut above those now offered.
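The value of collecting all three kinds of data can be shown with a small
decision rule. The function below is a hypothetical illustration rather than
part of the proposed procedure: its name, its three yes/no inputs, and the
wording of the readings it returns are assumptions made for the example. It
simply spells out how results from an earlier stage narrow the possible
explanations for a poor result at a later stage.

```python
def interpret_stages(mastery_ok, simulation_ok, work_sample_ok):
    """Return a tentative reading of three stage outcomes.

    Each argument is True if users met the criterion at that stage:
    achievement test, simulation, and work sample, in that order.
    """
    if not mastery_ok:
        return "principles not mastered"
    if not simulation_ok:
        return "principles mastered but not translated to a hypothetical application"
    if not work_sample_ok:
        return "applied in simulation but not in actual district work"
    return "mastered, applied in simulation, and applied in district work"


# Hypothetical outcome: users pass the achievement test but not the simulation.
print(interpret_stages(True, False, False))
```

With only simulation data, the first two readings could not be told apart;
the achievement test result is what separates them.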


